
\section{PS algorithm}
\secL{algorithm}
In this section we present PS, the algorithm that we propose to perform the Design Space Exploration (DSE). As any other DSE algorithm, its input is the parametrized system $S$ and a benchmark application $b$, while the output is an approximation of the Pareto-optimal set $\mathcal{P}\left(S,b\right)$.   
PS is iterative and we will describe, separately, the initialization phase, the operations performed inside each iteration and the condition that triggers the termination.

At a glance, at each iteration simulations are performed. Regions are classified based on the innovation introduced by their configurations in the Pareto-front so far calculated. The most interesting regions are split, so that each sub-region will receive more evaluation effort in the next iteration. On the contrary, uninteresting regions are merged.

By a slightly abuse of notation, considering a subset $I$ of the configuration space, we will use  $\mathcal{P}(I,b)$ to denote the non-dominated feasible configurations in $I$, i.e.
	\begin{align}
		\mathcal{P}(I,b)=
		\left\{ \mathbf{c} \in I \cap \mathcal{C}^*(S) | \nexists \ \mathbf{c}' \in I \cap \mathcal{C}^*(S), \mathbf{c}' \succ \mathbf{c} \right\}
	\end{align}
Since in each run of the algorithm we will consider only one benchmark application $b\in\mathcal{B}$, we will omit it, replacing, for example, $\mathcal{P}(I,b)$ with  $\mathcal{P}(I)$.

Each iteration is called \emph{era}. The $i$-th era is characterized by: 
	\begin{itemize}
	\item the set of regions $\mathcal{R}_{i}$ in which the configuration space
	$\mathcal{C}(S)=V_{1}\times\dots\times V_{n}$ is partitioned
	\item an approximation $\mathscr{P}_{i}$ of the Pareto-optimal set.
	\item a set $test_{i}$ of at most $K$ configurations evaluated
	in the era.
	\end{itemize}
$K$ is a parameter that must be set before the algorithm runs.
Note that $\mathcal{R}_{i}$ is a partition of the configuration space, i.e.
	\begin{align}\begin{array}{l}
		\bigcup_{R\in\mathcal{R}_i} R = \mathcal{C}(S) \mbox{ and}\\
		R' \cap R'' = \emptyset, \forall R',R'' \in \mathcal{R}_i
	\end{array}\end{align}

We now formally describe PS.

\subsection{Initialization}
We fix $\mathcal{R}_0$, $\mathscr{P}_0$ and $test_0$ for the era $0$ as follows. Era $0$ has only one region that is the whole parameter space,
i.e. $\mathcal{R}_{0}=\left\{ V_{1}\times\dots\times V_{n}\right\} $.
A random set $test_{0}$ of $K$ configurations is evaluated. The
Pareto set $\mathscr{P}_{0}=\mathscr{P}\left(test_{0}\right)$ is
calculated.



\subsection{Iteration steps}
\secL{steps}

For $era_{i}$ with $i>0$ the following operations must be performed.
\begin{enumerate}
\item \label{pers02.enu:K}
In each era $i$, a set $test_i$ of $K$ configurations will be evaluated. We call them \emph{test configurations} and selecting them is exactly the goal of this step.
We select only \emph{new} configurations, i.e. the ones that have not been evaluated in past eras. In other words, $test_i$ must not overlap with $\bigcup_{j=0}^{i-1}test_{j}$.
Since all regions $R$ in the configuration space partition $\mathcal{R}_{i}$ must be properly explored, we have to distribute the test configurations among all these regions. For this reason, $\frac{K}{\left|\mathcal{R}_{i}\right|}$ configurations are randomly extracted from the new configurations of each region $R\in\mathcal{R}_{i}$. We use $test_{R,i}$ to denote the set of the selected configurations and, obviously, $\bigcup_{R\in\mathcal{R}_{i}} test_{R,i} = test_i$.
 In case the new configurations in $R$ are less then $\frac{K}{\left|\mathcal{R}_{i}\right|}$, all of them are selected. In this case $test_{R,i}=R\setminus\bigcup_{j=0}^{i-1}test_{j}$, where  $\bigcup_{j=0}^{i-1}test_{j}$ is the set of the configurations already evaluated in the past.\footnote{Note that if such regions exist, the total number of evaluations in era $i$ will be less than $K$}
\item All configurations on $test_{i}$ are simulated.
\item The Pareto set $\mathscr{P}_{i}$ for $era_{i}$ is defined as the set of non-dominated configurations among the ones evaluated so far, i.e.
	\[
	\mathscr{P}_{i} \triangleq
	\mathscr{P}\left(\bigcup_{j\leq i}test_j\right)=
	\mathscr{P}\left(test_{i}\cup\mathscr{P}_{i-1}\right)
	\]
where the last equality can easily be proved.

\item \label{pers02.enu:novelty_score_of_a_configuration}
The test configurations that are dominated, i.e. do not belong to $\mathscr{P}_i$, are neglected, while an \emph{innovation score} is given to the other ones, computed on the base of their separation from the non-dominated configurations of the past eras. Recalling that $s$ is the separation (see \defR{separation}), the innovation score of a non-dominated test configuration $\mathbf{c}^*$ is defined as
	\[
	is\left(\mathbf{c}^*\right)=\min\left\{ \left.s\left(\mathbf{c}^*\rightarrow\mathbf{c} \right)\right|\mathbf{c}\in\mathscr{P}_{i-1}\right\} 
	\]
In other words, a configuration $\mathbf{c}^*$ is characterized by as much innovation
as more separated it is from the configurations of the previous era.

\item \label{enu:region_innovation_score} Every region $R\in\mathcal{R}_{i}$ is given an innovation score defined
as the sum of the innovation scores of its non-dominated test configurations:
	\[
	is\left(R\right)=\sum\left\{ \left.is\left(\mathbf{c}\right)\right|\mathbf{c}\in\mathscr{P}_{i}\cap test_{R,i} \right\} 
	\]

\item The total innovation and average score for the entire era is calculated:
	\begin{align} \begin{array}{l}
	is_{TOT,i}=\sum_{R\in\mathcal{R}_{i}}is\left(R\right) \\
	is_{av,i}=is_{TOT,i} / |\mathcal{R}_i|
	\end{array} \end{align}
	

\item The configuration space partition $\mathcal{R}_{i}$ is divided in three subsets, that we will comment later:

	\begin{enumerate}
	\item \emph{high innovation regions}, whose innovation score exceeds the average score of the previous era by a factor of $\alpha$, i.e.
	\[
	\mathcal{R}_{i,h}=\left\{ \left.R\in\mathcal{R}_{i}\right|is\left(R\right)>\alpha\cdot is_{av,i-1}\right\} 
	\]
	 where $\alpha$ is a constant defined by the designer (for example
	$\alpha=1.2$). 	
	\item \emph{low innovation regions}, whose score is positive but modest, i.e.
	\[
	\mathcal{R}_{i,l}=\left\{ \left.R\in\mathcal{R}_{i}\right|0<is\left(R\right)\le\alpha\cdot is_{av,i-1}\right\} 
	\]
	\item \emph{no innovation regions}, whose innovation is null, i.e. 
	\[
	\mathcal{R}_{i,n}=\left\{ \left.R\in\mathcal{R}_{i}\right|is\left(R\right)=0\right\} 
	\]
	\end{enumerate}

\item The goal of this step is preparing the regions that will compose the configuration space partition of the next era, i.e. $\mathscr{R}_{i+1}$. The following operations are performed:
	\begin{enumerate}
		\item Each high innovation region $R\in\mathscr{R}_{i,h}$ is split to obtain $\psi(R)$, as stated in \defR{splitting-regions}. We use $\psi\left(\mathscr{R}_{i,h}\right)$ to denote the set of regions obtained as explained.
		\item Low innovation regions are left unchanged.
		\item No innovation regions are merged, two by two. To do so, pairs of no innovation contiguous regions $(R_1,R_2)$ are selected. We use $\mathscr{M}_i$ to represent the set of these pairs. This set is constructed such that pairs are independent, i.e. there cannot be two of them sharing a region. In other words, taking whatever $(R_1,R_2),(R_3,R_4) \in \mathscr{M}_i$, $R_i \neq R_j, \forall i,j\in\lbrace1,2,3,4\rbrace, i\neq j$. 
We merge each pair of these regions, i.e. $\forall (R_1,R_2)\in\mathscr{M}_i$ we calculate $\mu(R_1,R_2)$, according to \defR{merging-regions}.
We stress that the regions that belong to each selected pair must be contiguous, otherwise it is impossible to merge them. We use $\mu(\mathscr{M}_i)$ to denote the set of the merged regions, i.e. $\mu(\mathscr{M}_i) = \bigcup_{(R_1,R_2)\in\mathscr{M}_i} \mu(R_1,R_2)$
	\end{enumerate}


\item The set of regions $\mathcal{R}_{i+1}$ for the following era is composed of the sub-regions resulting from splitting operations ($\psi\left(\mathscr{R}_{i,h}\right)$), the low innovation regions unchanged $\mathcal{R}_{i,l}$, the result of the merging operations ($\mu(\mathscr{M}_i))$ and the no innovation regions that was not possible to merge $\left(\mathcal{R}_{i,n}\setminus\bigcup_{\left(R_{1},R_{2}\right)\in\mathscr{M}_i }\left\{ R_{1},R_{2}\right\} \right)$. Formally:
	\begin{align} 
	\mathcal{R}_{i+1} \triangleq
	\psi\left(\mathscr{R}_{i,h}\right) \cup \mathcal{R}_{i,l}\cup
	\left(\mathcal{R}_{i,n}\setminus\bigcup_{\left(R_{1},R_{2}\right)\in\mathscr{M}_i }\left\{ R_{1},R_{2}\right\} \right)\cup \mu(\mathscr{M}_i)
	\end{align}
\end{enumerate}

\begin{algorithm}[t]
	\small
	\caption{PS Algorithm}
	\label{alg:algo}
	  \SetKwInOut{Input}{Input}\SetKwInOut{Output}{Output}
	  \Input{S}
	  \Output{ $\tilde{\mathscr{P}}$ // Approximation of Pareto-set}
%
	  \nl //Initialization \\
	  \nl $\mathscr{R}_0\Leftarrow
				\left\{ V_{1}\times\dots\times V_{n}\right\} $; \\
	  \nl $\mathscr{P}_0\Leftarrow $ extract\_non\_dominated($test_0$);\\
	  \nl \\
	  \nl $i\Leftarrow 0$;\\
	  \nl \While{ !termination\_condition } {
			// Iteration steps\\
			\nl $test_i=\emptyset$;\\
	  		\nl \ForEach {$R \in \mathscr{R}_i$}{
				\nl $\mathscr{N}_{R,i} \Leftarrow$ configurations in $R$ not evaluated so far;\\
				\nl $|\mathscr{N}_{R,i}|\leq K/|\mathscr{R}_i|$? \\
						$\mbox{\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,}test_{R,i}\Leftarrow \mathscr{N}_{R,i}$:\\
						\mbox{\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,}$test_{R,i}\Leftarrow K/|\mathscr{R}_i|$ randomly selected configurations from $\mathscr{N}_{R,i}$;\\
			\nl $test_i\Leftarrow test_i \cup test_{R,i}$

			}

			\nl simulate ($test_i$);\\
			\nl $\mathscr{P}_i \Leftarrow 
				\mathscr{P}\left(test_{i}\cup\mathscr{P}_{i-1}\right)$;\\
			\nl \ForEach{$\mathbf{c}^*\in\mathscr{P}_i$} {
				\nl $is\left(\mathbf{c}^*\right) \Leftarrow \min\left\{ \left.s\left(\mathbf{c}\rightarrow\mathbf{c} \right)\right|\mathbf{c}\in\mathscr{P}_{i-1}\right\}$;\\
				}
			\nl \ForEach{$R\in\mathscr{R}_i$} {
				\nl $is\left(R\right)=\sum\left\{ \left.is\left(\mathbf{c}\right)\right|\mathbf{c}\in\mathscr{P}_{i}\cap test_{R,i} \right\} $;
			}

			\nl $is_{TOT,i}=\sum_{R\in\mathcal{R}_{i}}is\left(R\right)$ \\
			\nl $is_{av,i}=is_{TOT,i} / |\mathcal{R}_i|$\\
			\nl \\ //Region classification\\
			\nl $\mathcal{R}_{i,h}=\left\{ \left.R\in\mathcal{R}_{i}\right|is\left(R\right)>\alpha\cdot is_{av,i-1}\right\}$; \\
			\nl $\mathcal{R}_{i,l}=\left\{ \left.R\in\mathcal{R}_{i}\right|0<is\left(R\right)\le\alpha\cdot is_{av,i-1}\right\} $; \\
			\nl $\mathcal{R}_{i,n}=\left\{ \left.R\in\mathcal{R}_{i}\right|is\left(R\right)=0\right\} $;
			
			\nl \\ // Splitting and merging operations\\
			\nl $\psi(\mathscr{R}_{i,h}) \Leftarrow \lbrace 
					\psi(R) | R\in \mathscr{R}_{i,h}	
				\rbrace$; \\
			\nl construct $\mathscr{M}_i$; // the set of pair of contiguous no innovation regions \\
			\nl $\mu(\mathscr{M}_i) = \bigcup_{(R_1,R_2)\in\mathscr{M}_i} \mu(R_1,R_2)$;\\
			\nl\\ // Prepare the configuration space partition for the next iteration \\
			\nl $\mathcal{R}_{i+1} \triangleq
	\psi\left(\mathscr{R}_{i,h}\right) \cup \mathcal{R}_{i,l}\cup
	\left(\mathcal{R}_{i,n}\setminus\bigcup_{\left(R_{1},R_{2}\right)\in\mathscr{M}_i }\left\{ R_{1},R_{2}\right\} \right)\cup \mu(\mathscr{M}_i)$;\\
		}
	\nl $\tilde{\mathscr{P}} \Leftarrow \mathscr{P}_i$
\end{algorithm}

\subsection{Termination Condition}
The algorithm terminates at step $k$ if
\[
is_{TOT,k},is_{TOT,k-1},\dots,is_{TOT,k-\beta}\le\gamma\cdot is_{TOT,k-\left(\beta-1\right)}
\]
 where $\beta$ and $\gamma$ are selected by the experimenter. 
In other words, the algorithm terminates if, in the last
$\beta$ iterations, the ``amount of innovation'' has been
small. Therefore, it is very probably that carrying on iterating
will not considerably change the Pareto-front approximation already formed. 

It is not recommended to chose $\beta=1$, because it is possible that the
$k-1$-th and $k$-th Pareto front approximations are not very different but next ones could be. In other words,
it is unsafe to terminate as soon as the Pareto front does not considerably change
between two consecutive eras. For this reason, we take into
account the history of the Pareto fronts.

A pseudo-code representation of PS algorithm is in Alg.\ref{alg:algo}.
%!!! IN INGLESE E' PIU' CORRETTO DIRE: ``IT IS POSSIBLE THAT THE PARETO
%FRONTS WERE NOT VERY DIFFERENT''? (CON ``WERE'' AL POSTO DI ``ARE'')

\subsection{Exploration vs. Exploitation in PS}
The proposed algorithm has been designed in order to be inherently able to balance exploration and exploitation, which are ``two cornerstones of problem solving by search''~\cite{eiben1998evolutionary}. While a detailed survey on the exploration vs. exploitation trade-off can be found in~\cite{vcrepinvsek2013exploration}, here we limitedly recall that \emph{exploration} is the capability of an algorithm to consider different parts of the configuration space, not being confined only to a limited subset, and thus not being trapped in local optima. Following the terminology of \cite{weise2012evolutionary}, if the exploration is not sufficiently wide, the algorithm suffers from \emph{premature convergence}, i.e. it starts to analyze only a narrow portion of the configuration space, thus ignoring potential Pareto configurations that are outside this portion. When premature convergence happens, there is insufficient \emph{diversity}~\cite{weise2012evolutionary}, i.e. all the provided solutions are similar.
On the other hand, \emph{exploitation} is the capability to recognize and densely analyze the parts of configuration space that have many Pareto configurations.

While in genetic algorithms exploration and exploitation are guaranteed by mutation, crossover and selection, in our algorithm these guarantees are provided for the following reasons:
	\begin{itemize}
		\item Merging and splitting operations are dynamic processes. A portion
of the parameter space may be populated by a large number of small
regions in some eras, and thus a large number of test configurations. However, after having been sufficiently explored, the innovation score of these regions may decrease, they may be merged and therefore that portion of configuration space may be covered by a small number of large regions in next eras. This is a desirable property of PS, since, after a number of eras of intense exploration of a configuration space part, this exploration may turn out to be enough thorough and so it may be time for other parts to be minutely scanned. 
PS dynamics can be summarized as follows: while eras alternate, PS zooms out from the configurations subspaces already sufficiently scanned in the past, whereas it zooms in on subspaces which provide novelty (guaranteeing exploitation), meaning that they have not been properly analyzed yet (this helps exploration). 
		\item No portion of parameter space is neglected since, at each era, all the regions, even the least interesting ones, have the chance to be explored with new evaluated points (providing exploration), even though some regions are less densely explored than others.
	\end{itemize}

To summarize, thanks to the above characteristics, PS is able to recognize the regions where the best configurations are, while avoiding premature convergence, not being trapped in a narrow subset of configuration space.

%

\subsection{Algorithm Parameters and Scalability} The parameters that determine the algorithm are the configurations budget for each
era $K$ and the threshold factor $\alpha$. From a functional point of
view, $K$ is somewhat similar to the population size of a genetic
approach, determining how many configurations should be evaluated in the
subsequent step of the algorithm. However, as seen in the
previous subsection, the way this $K$ configurations are distributed
across the parameters space is completely different.
Authors of \cite{zitzler_ec00} found that
increasing $K$ not necessarily has a considerable impact on the
performance of the algorithm, and this holds also in our algorithm (see for example
Figure~\ref{fig:kimpact} and Subsection~\ref{sec:comparison}).
Nonetheless, while in the genetic approach $K$ must be sufficiently large
in order to guarantee the diversity of the Pareto set configurations, in
our algorithm the diversity is not impacted by $K$ since, by
construction, it is guaranteed by the way the configuration space is
partitioned, independently of the number of configurations evaluated at
each iteration. 

On the other side, parameter $\alpha$ determines when a region is considered of high
innovation. High values of $\alpha$ mean that a region will hardly be
elected as high-innovation, producing less splitting operations. This
slows down zooming in on portions of configuration space and makes the
distribution of the simulated configurations more uniform, since there are
fewer small regions with a high test configuration concentration. The
opposite happens when small values of $\alpha$ are used. In this
introductory work we are not explicitly focused on investigating how
these parameters could be fine tuned, nevertheless
we conducted some preliminary tests to confirm the relative stability
of $PS$ with different choices and to select some ``reasonable'' average
values to perform the design space exploration, as we will discuss
in~\secR{ee}.

With regard to the performance of the proposed approach, we observe it is related to the total number of
different configurations evaluated during the exploration. We can
safely make such assumption, because the process of evaluating a new
configuration is the time-dominating one among all the operations
performed. As we will show in the
next Section, this is especially true in the hardware/software
design space exploration scenario we are addressing, since benchmark
compilation and execution times
are orders of magnitude bigger than the splitting/merging operations and
innovation scores computations.  The $i$-th era would provide a new set
$test_i$ of $K$ configurations to be tested. However, we cannot know
a priori how many of
these will actually be new configurations, i.e. requiring a process of
evaluation. Thus, the number of total new simulations $N_{tot}$ 
is bounded as follows:
\[
K \le N_{tot} \le K \times E_{tot}
\]
where $E_{tot}$ is the number of eras executed and assuming that at
least $K$ evaluation should be performed in order to obtain the first
Pareto $\mathscr{P}_0$. 


\subsection{On the parallelizability of PS}
While the detailed analysis and the evaluation of a parallel
implementation of the proposed algorithm is not in the scope of this paper, we
provide in this subsection some arguments to show different strategies
that could be applied to make PS execution parallelizable.

Following the approach used in the ``Master-Slaves parallel genetic
algorithms''~\cite{Cantu-Paz98asurvey,Borkar14parallel}, we can
distinguish a master and different slave processing nodes. At each era
the master assigns a set of regions to each slave, which, in turn,
independently performs the analysis and calculates the innovation
score of the assigned regions. At the end of each era, the master
collects the results from the slaves, performs the splitting/merging
operations and the next era starts.

Another possible approach is closer to the ``Island scheme'' for GA
parallelization~\cite{Borkar14parallel}. The configuration space is
initially divided in $n$ macro-regions, each one assigned to a
different processing node, which autonomously performs all the PS
steps. When merging and splitting, each processing node consider the assigned
macro-region as the entire parameter space. Note that, while the
Island scheme for GA parallelization requires a migration operator,
that is not easy to properly tune, we do not have special
requirements. On the other hand, one of the macro-regions may begin to
provide very low innovation at a certain era. In this case, dedicating
to it the same number $K$ of test functions as to the other macro-regions
is inefficient. To solve this problem, the number of test functions
assigned to each macro-region should be periodically renegotiated, in
favor of the macro-regions providing most innovation.

Opposite to the ``Island  scheme'', a novel parallelization technique
permitted by PS can consist of different instances of PS, each one
running on the whole configuration space but having its own
configuration space partition that separately evolves. Each PS
instance independently performs the zoom-out and zoom-in on
configuration space, splitting and merging its own regions. All PS
instances share the indication of the configurations already tested,
so that a configuration already tested by some PS instance, will not
be tested again by some other PS instance. In addition, PS instances
share the set of non-dominated configurations. Therefore, when a PS
instance evaluates the innovation score of its regions (steps
\ref{pers02.enu:novelty_score_of_a_configuration} and
\ref{enu:region_innovation_score} of \secR{steps}), it considers also the
solutions found by the other PS instances. In this way, the
configuration space is spanned in parallel by multiple PS instances,
that, although autonomous, are not isolated; on the contrary, they
continuously exchange information and influence each other, thanks to
the sharing of the the set of already simulated configurations and the set of non-dominated
ones. Otherwise stated, all the PS instances (independently but
jointly) contribute to the formation of the Pareto-set.

We defer the evaluation of these different proposals to our future
research, the aim of this section being just to show that our
algorithm has the potentiality of being parallelized, thus leaving
room for performance improvement.
